Undoubtedly, we’ve all heard about “flattening the curve”: the goal of slowing the rate of new COVID-19 infections by limiting person-to-person contacts. A related question is what the curve even looks like right now. Is it accelerating or slowing down? When is the peak expected to occur? Now that data collection has become more systematic, we can examine these questions through the lens of the values observed so far.

Below, I apply simple curve fitting to model new COVID-19 cases in the U.S. over time. The data is aggregated from three different sources (see below). I chose to fit a log-normal distribution, which is traditionally used by epidemiologists to study cases by date of onset. The curve has an asymmetric bell-like shape with a rapid increase in new cases at the start, followed by a more gradual decrease over time.
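To make the idea concrete, here is a minimal sketch of fitting a scaled log-normal curve to daily case counts. It is not the linked code: the function names, the synthetic data, and the fitting technique are assumptions chosen for illustration. Instead of a general nonlinear optimizer, it uses the fact that the logarithm of (cases × day) is a quadratic function of the logarithm of the day, so the three parameters can be recovered with ordinary least squares on a parabola. Day numbers are assumed to count from 1 at some chosen start date.

```python
import math


def lognormal_curve(t, scale, mu, sigma):
    """Scaled log-normal density: modeled new cases on day t (t > 0)."""
    return scale / (t * sigma * math.sqrt(2 * math.pi)) * math.exp(
        -((math.log(t) - mu) ** 2) / (2 * sigma ** 2)
    )


def solve_linear(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - known) / aug[i][i]
    return solution


def fit_lognormal(days, cases):
    """Fit (scale, mu, sigma) by least squares in log space.

    With x = ln(t) and y = ln(cases * t), the model becomes
    y = c0 + c1*x + c2*x**2, where c2 = -1/(2*sigma**2) and c1 = mu/sigma**2.
    Requires strictly positive case counts and a downward-opening parabola.
    """
    xs = [math.log(t) for t in days]
    ys = [math.log(c * t) for t, c in zip(days, cases)]
    # Normal equations for the quadratic basis [1, x, x**2].
    power_sums = [sum(x ** k for x in xs) for k in range(5)]
    moments = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    normal_matrix = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    c0, c1, c2 = solve_linear(normal_matrix, moments)
    sigma = math.sqrt(-1.0 / (2.0 * c2))
    mu = c1 * sigma ** 2
    scale = math.exp(c0 + mu ** 2 / (2 * sigma ** 2)) * sigma * math.sqrt(2 * math.pi)
    return scale, mu, sigma


# Synthetic, noise-free data generated from made-up parameters (not real counts).
true_params = (500000.0, 3.5, 0.4)
days = list(range(1, 41))
cases = [lognormal_curve(t, *true_params) for t in days]

scale, mu, sigma = fit_lognormal(days, cases)
peak_day = math.exp(mu - sigma ** 2)  # mode of the log-normal curve
print(f"scale={scale:.1f} mu={mu:.3f} sigma={sigma:.3f} peak day={peak_day:.1f}")
```

The peak of the fitted curve falls at day exp(mu - sigma²), which gives one answer to the question of when the peak is expected. On real, noisy counts a log-space fit gives more weight to the small early values than a direct least-squares fit on the raw counts would, so the two approaches can produce different curves.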

After fitting the curve, I use it to plot projections for five days into the future. These predictions are a pet project to satisfy my own curiosity and should be taken with a grain of salt, since they don’t account for the many factors at play in a rapidly changing situation. With that said, I aim to re-run the script daily to fit new data points as they become available.
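Projecting forward amounts to evaluating the fitted curve at day numbers past the last observation. The sketch below assumes a scaled log-normal curve whose parameters have already been fitted; the parameter values, the last observed day, and its calendar date are all hypothetical placeholders, not results from the real data.

```python
import math
from datetime import date, timedelta


def lognormal_curve(t, scale, mu, sigma):
    """Scaled log-normal density: modeled new cases on day t (t > 0)."""
    return scale / (t * sigma * math.sqrt(2 * math.pi)) * math.exp(
        -((math.log(t) - mu) ** 2) / (2 * sigma ** 2)
    )


def project(last_day, last_date, params, horizon=5):
    """Evaluate the fitted curve for `horizon` days after the last observation."""
    return [
        (last_date + timedelta(days=k), lognormal_curve(last_day + k, *params))
        for k in range(1, horizon + 1)
    ]


# Hypothetical values for illustration only.
fitted_params = (500000.0, 3.5, 0.4)    # (scale, mu, sigma)
last_observed_day = 25                  # day number of the latest data point
last_observed_date = date(2020, 3, 25)  # calendar date of that data point

projections = project(last_observed_day, last_observed_date, fitted_params)
for day, value in projections:
    print(f"{day.isoformat()}: {value:,.0f} projected new cases")
```

Because the projection is a pure extrapolation of the curve, re-fitting on each day's new data can shift every projected value, which is one more reason to read the numbers loosely.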

The plot is interactive and best viewed on a computer, where you can hover a mouse pointer over individual points, as well as click and drag to pan and zoom around. The functionality is more limited on a mobile screen and varies from device to device. On some devices, the plotting area prevents you from scrolling with your finger; swipe along the edges of the plot to scroll instead.


Data Sources

The observed data is aggregated across three data sources: Johns Hopkins University (JHU), The COVID Tracking Project (CTP), and The New York Times (NYT). The reason for aggregation is to smooth out any small discrepancies in reporting. For example, in the plot below you may notice that the number of cases was under-reported for Mar 18th and over-reported for Mar 19th by JHU, relative to the other two data sources. (This is likely due to time zone differences.) To reduce the effect of such artifacts, the curve is fit to the median values computed for each date.
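The per-date median can be sketched in a few lines. The counts below are made-up numbers that only mimic the pattern described above (one source low on one day and high on the next); they are not the actual reported figures, and the data layout is an assumption.

```python
from statistics import median

# Made-up daily new-case counts per source, keyed by ISO date.
reports = {
    "JHU": {"2020-03-18": 1500, "2020-03-19": 6000},
    "CTP": {"2020-03-18": 2500, "2020-03-19": 4500},
    "NYT": {"2020-03-18": 2600, "2020-03-19": 4400},
}


def median_by_date(reports):
    """Return {date: median count across the sources that report that date}."""
    dates = sorted({d for series in reports.values() for d in series})
    return {
        d: median(series[d] for series in reports.values() if d in series)
        for d in dates
    }


medians = median_by_date(reports)
for d, value in medians.items():
    print(d, value)
```

With three sources, the median simply discards the highest and lowest value for each date, so a single outlying report on either side has no effect on the value being fitted.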


References: [JHU Data] [CTP Data] [NYT Data] [Code]